Abstract

In this paper we show that, for the purposes of dimensionality reduction, a certain class of
structured random matrices behaves similarly to random Gaussian matrices. This class includes
several matrices for which the matrix-vector multiply can be computed in log-linear time, providing
efficient dimensionality reduction of general sets. In particular, we show that using such matrices
any set from high dimensions can be embedded into lower dimensions with near optimal distortion.
We obtain our results by connecting dimensionality reduction of any set to dimensionality
reduction of sparse vectors via a chaining argument.


1 Introduction 

Dimensionality reduction or sketching is the problem of embedding a set from high-dimensions into 
a low-dimensional space, while preserving certain properties of the original high-dimensional set. 
Such low-dimensional embeddings have found numerous applications in a wide variety of applied 
and theoretical disciplines across science and engineering. 

Perhaps the most fundamental and popular result for dimensionality reduction is the Johnson-Lindenstrauss
(JL) lemma. This lemma states that any set of $p$ points in high dimensions can
be embedded into $O(\log p/\delta^2)$ dimensions, while preserving the Euclidean norm of all points within
a multiplicative factor between $1-\delta$ and $1+\delta$. The Johnson-Lindenstrauss Lemma in its modern
form can be stated as follows.

Lemma 1.1 (Johnson-Lindenstrauss Lemma [15]) Let $\delta \in (0,1)$ and let $x_1, x_2, \ldots, x_p \in \mathbb{R}^n$
be arbitrary points. Then as long as $m = O(\log p/\delta^2)$ there exists a matrix $A \in \mathbb{R}^{m\times n}$ such that

$(1-\delta)\|x_i\|_{\ell_2} \le \|Ax_i\|_{\ell_2} \le (1+\delta)\|x_i\|_{\ell_2}$, (1.1)

for all $i = 1,2,\ldots,p$.


‘Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley CA 
^Simons Institute for the Theory of Computing, UC Berkeley, Berkeley CA 
^Department of Statistics, UC Berkeley, Berkeley CA 

^Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 





This lemma was originally proven to hold with high probability for a matrix $A$ that projects
all data points onto a random subspace of dimension $m$ and then scales them by $\sqrt{n/m}$. The
result was later generalized so that $A$ could have i.i.d. normal random entries as well as other
random ensembles [7, 12]. More recently the focus has been on constructions of the matrix $A$
where multiplication by this matrix can be implemented efficiently in terms of time and storage,
e.g. matrices where it takes at most $O(n \log n)$ time to implement the multiplication. Please see
the constructions in [1,10,16,19,20] as well as the more recent papers [2,23] for further details on
related and improved constructions.
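
As a quick illustration of the Gaussian case mentioned above, the following sketch draws a matrix with i.i.d. $\mathcal{N}(0,1/m)$ entries and measures how well it preserves the norms of a few random points. The dimensions, the seed and the helper names (`gaussian_sketch`, `norm`) are illustrative choices of ours, not part of the original text.

```python
import math
import random

def gaussian_sketch(points, m, rng):
    """Multiply each point by an m x n matrix with i.i.d. N(0, 1/m) entries."""
    n = len(points[0])
    scale = 1.0 / math.sqrt(m)
    A = [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]
    return [[sum(a * x for a, x in zip(row, p)) for row in A] for p in points]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

rng = random.Random(0)          # fixed seed, purely for reproducibility
n, m, p = 200, 100, 10          # illustrative sizes (assumption)
points = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
sketches = gaussian_sketch(points, m, rng)
ratios = [norm(y) / norm(x) for x, y in zip(points, sketches)]
worst_distortion = max(abs(r - 1.0) for r in ratios)
```

Here `worst_distortion` plays the role of the smallest $\delta$ for which (1.1) holds on this particular point set and draw of $A$.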

In many uses of dimensionality reduction such as those arising in statistical learning, optimization,
numerical linear algebra, etc., embedding a finite set of points is often not sufficient and one
aims to embed a set containing an infinite continuum of points into lower dimensions while preserving
the Euclidean norm of all points up to a multiplicative distortion. A classical result due to
Gordon [13] characterizes the precise tradeoff between distortion, “size” of the set and the amount
of reduction in dimension for a subset of the unit sphere. Before stating this result we need the
definition of the Gaussian width of a set which provides a measure of the “complexity” or “size”
of a set $T$.

Definition 1.2 For a set $T \subset \mathbb{R}^n$, the mean width $\omega(T)$ is defined as

$\omega(T) = \mathbb{E}\big[\sup_{v\in T} g^T v\big]$.

Here, $g \in \mathbb{R}^n$ is a Gaussian random vector distributed as $\mathcal{N}(0, I_n)$.


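Theorem 1.3 (Gordon’s escape through the mesh) Let $\delta \in (0,1)$, $T \subset \mathbb{R}^n$ be a subset of the
unit sphere ($T \subset \mathbb{S}^{n-1}$) and let $A \in \mathbb{R}^{m\times n}$ be a matrix with i.i.d. $\mathcal{N}(0, 1/m)$ entries.^1 Then,

$\big| \|Ax\|_{\ell_2} - \|x\|_{\ell_2} \big| \le \delta \|x\|_{\ell_2}$, (1.2)

holds for all $x \in T$ with probability at least $1 - 2e^{-\eta^2/2}$ as long as

$m \ge \frac{(\omega(T)+\eta)^2}{\delta^2}$. (1.3)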


We note that the Johnson-Lindenstrauss lemma for Gaussian matrices follows as a special case.
Indeed, for a set $T$ containing a finite number of points $|T| \le p$, one can show that $\omega(T) \le \sqrt{2\log p}$
so that the minimal amount of dimension reduction $m$ allowed by (1.3) is of the same order as
Lemma 1.1.
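
The bound $\omega(T) \le \sqrt{2\log p}$ can be checked numerically for a small finite set. The sketch below estimates the mean width of Definition 1.2 by Monte Carlo for the $p$ standard basis vectors; the choice of set, the number of trials and the function name `gaussian_width_finite` are assumptions made for illustration only.

```python
import math
import random

def gaussian_width_finite(points, trials, rng):
    """Monte Carlo estimate of E[sup_{v in T} <g, v>] for a finite set T."""
    n = len(points[0])
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(sum(gi * vi for gi, vi in zip(g, v)) for v in points)
    return total / trials

rng = random.Random(1)
p = 50
# T = {e_1, ..., e_p}: p points on the unit sphere of R^p (illustrative choice).
basis = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
width_estimate = gaussian_width_finite(basis, 500, rng)
upper_bound = math.sqrt(2.0 * math.log(p))
```

For this set the supremum is simply the largest coordinate of $g$, so the estimate is the expected maximum of $p$ standard Gaussians.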

More recently a line of research by Mendelson and collaborators [17,18,21,22] shows that the
inequality (1.2) continues to hold for matrices with i.i.d. sub-Gaussian entries (albeit at a loss in
terms of the constants). Please also see [9,28] for more recent results and applications. Connected
to this, Bourgain, Dirksen, and Nelson [3] have shown that a similar result to Gordon’s theorem
continues to hold for certain ensembles of matrices with sparse entries. However, compared to
Gordon’s result above the allowed reduction in dimension is smaller by constant and logarithmic
factors and an additional factor that characterizes the “spikiness” of the set $T$.


^1 We note that the factor $1/m$ in the above result is approximate. For the precise result one should replace $1/m$ with
$\frac{1}{2}\left(\frac{\Gamma(m/2)}{\Gamma((m+1)/2)}\right)^2 \approx 1/m$, where $\Gamma$ denotes the Gamma function.
This paper develops an analogue of Gordon’s result for more structured matrices, particularly
those that have computationally efficient multiplication. At the heart of our analysis is a theorem
that shows that matrices that preserve the Euclidean norm of sparse vectors (a.k.a. RIP matrices),
when multiplied by a random sign pattern, preserve the Euclidean norm of any set. Roughly stated,
linear transforms that provide low distortion embedding of sparse vectors also allow low distortion
embedding of any set! We believe that our result provides a rigorous justification for replacing
“slow” Gaussian matrices with “fast” and computationally friendly matrices in many scientific and
engineering disciplines. Indeed, in a companion paper [24] we utilize our results in this paper to
develop sharp rates of convergence for various optimization problems involving such matrices.

2 Isometric sketching of sparse vectors 

To connect isometric sketching of sparse vectors to isometric sketching of general sets, we begin
by defining the Restricted Isometry Property (RIP). Roughly stated, RIP ensures that a matrix
preserves the Euclidean norm of sparse vectors up to a multiplicative distortion $\delta$. This definition
immediately implies that RIP matrices can be utilized for isometric sketching of sparse vectors.

Definition 2.1 (Restricted Isometry Property) A matrix $A \in \mathbb{R}^{m\times n}$ satisfies the Restricted
Isometry Property with distortion $\delta > 0$ at a sparsity level $s$, if for all vectors $x$ with sparsity at
most $s$, we have

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \max(\delta,\delta^2)\,\|x\|_{\ell_2}^2$. (2.1)

We shall use the short-hand RIP$(\delta, s)$ to denote this property.

This definition is essentially identical to the classical definition of RIP [4]. The only difference is
that we did not restrict $\delta$ to lie in the interval $[0,1]$. As a result, the correct dependence on $\delta$ in
the right-hand side of (2.1) is in the form of $\max(\delta,\delta^2)$. For the purposes of this paper we need
a more refined notion of RIP. More specifically, we need RIP to simultaneously hold for different
sparsity and distortion levels.

Definition 2.2 (Multiresolution RIP) Let $L = \lceil \log_2 n \rceil$. Given $\delta > 0$ and a number $s \ge 1$, for
$\ell = 0,1,2,\ldots,L$, let $(\delta_\ell, s_\ell) = (2^{\ell/2}\delta, 2^\ell s)$ be a sequence of distortion and sparsity levels. We say a
matrix $A \in \mathbb{R}^{m\times n}$ satisfies the Multiresolution Restricted Isometry Property (MRIP) with distortion
$\delta > 0$ at sparsity $s$, if for all $\ell \in \{1,2,\ldots,L\}$, RIP$(\delta_\ell, s_\ell)$ holds. More precisely for vectors of sparsity
at most $s_\ell$ ($\|x\|_{\ell_0} \le s_\ell$) the sequence of inequalities

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \max(\delta_\ell,\delta_\ell^2)\,\|x\|_{\ell_2}^2$, (2.2)

simultaneously holds for all $\ell \in \{1,2,\ldots,L\}$. We shall use the short-hand MRIP$(\delta, s)$ to denote this
property.

This definition essentially requires the matrix to satisfy RIP at different scales. At the lowest scale,
it reduces to the standard RIP$(\delta, s)$ definition. Noting that $s_L = 2^L s \ge n$, at the highest scale this
condition requires

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \max(\delta_L,\delta_L^2)\,\|x\|_{\ell_2}^2$,

to hold for all vectors $x \in \mathbb{R}^n$. While this condition looks rather abstract at first sight, with proper
scaling it can be easily satisfied for popular random matrix ensembles used for dimensionality
reduction.
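
To make the scales in Definition 2.2 concrete, the short sketch below lists the pairs $(\delta_\ell, s_\ell)$ for given $n$, $\delta$ and $s$. The function name `mrip_levels` and the numerical values are hypothetical and chosen only for illustration.

```python
import math

def mrip_levels(n, delta, s):
    """Return [(delta_l, s_l)] for l = 0..L with L = ceil(log2 n)."""
    L = math.ceil(math.log2(n))
    return [(2 ** (l / 2) * delta, 2 ** l * s) for l in range(L + 1)]

levels = mrip_levels(n=1024, delta=0.1, s=4)
top_delta, top_s = levels[-1]   # the coarsest scale, where s_L >= n
```

At the top level the sparsity constraint is vacuous, so the corresponding inequality has to hold for every vector in $\mathbb{R}^n$, at the price of the larger distortion $2^{L/2}\delta$.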
3 From isometric sketching of sparse vectors to general sets


Our main result states that a matrix obeying Multiresolution RIP with the right distortion level $\tilde{\delta}$
can be used for embedding any subset $T$ of $\mathbb{R}^n$.


Theorem 3.1 Let $T \subset \mathbb{R}^n$ and suppose the matrix $H \in \mathbb{R}^{m\times n}$ obeys the Multiresolution RIP with
sparsity and distortion levels

$s = 150(1+\eta)$ and $\tilde{\delta} = \frac{\delta \cdot \mathrm{rad}(T)}{C \max(\mathrm{rad}(T), \omega(T))}$, (3.1)

with $C > 0$ an absolute constant. Then, for a diagonal matrix $D$ with an i.i.d. random sign pattern
on the diagonal, the matrix $A = HD$ obeys

$\sup_{x\in T} \big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \max(\delta,\delta^2)\cdot(\mathrm{rad}(T))^2$, (3.2)

with probability at least $1-\exp(-\eta)$. Here, $\mathrm{rad}(T) = \sup_{v\in T}\|v\|_{\ell_2}$ is the maximum Euclidean norm
of a point inside $T$.


This theorem shows that a matrix that is good for isometric embedding of sparse vectors becomes
suitable for isometric embedding of any set once its columns are multiplied by a random sign
pattern! For typical random matrix ensembles that are commonly used for dimensionality reduction
purposes, given a sparsity $s$ and distortion $\delta$ the minimum dimension $m$ for the MRIP$(s,\delta)$ to hold
grows as $m \sim s/\delta^2$. In Theorem 3.1, we have $s \sim 1$ and $\tilde{\delta} \sim \delta\,\mathrm{rad}(T)/\max(\mathrm{rad}(T),\omega(T))$ so that the minimum dimension $m$
for (3.2) to hold is of the order of $m \sim \max(\mathrm{rad}^2(T),\omega^2(T))/(\delta^2\,\mathrm{rad}^2(T))$. This is exactly the same scaling one would obtain
by using Gaussian random matrices via Gordon’s lemma in (1.3). To see this more clearly we now
focus on applying Theorem 3.1 to random matrices obtained by subsampling a unitary matrix.

Definition 3.2 (Subsampled Orthogonal with Random Sign (SORS) matrices) Let $F \in \mathbb{R}^{n\times n}$
denote an orthonormal matrix obeying

$F^*F = I$ and $\max_{i,j}|F_{ij}| \le \frac{\Delta}{\sqrt{n}}$. (3.3)

Define the random subsampled matrix $H \in \mathbb{R}^{m\times n}$ with i.i.d. rows chosen uniformly at random from
the rows of $F$. Now we define the Subsampled Orthogonal with Random Sign (SORS) measurement
ensemble as $A = HD$, where $D \in \mathbb{R}^{n\times n}$ is a random diagonal matrix with the diagonal entries
i.i.d. $\pm 1$ with equal probability.

To simplify exposition, in the definition above we have focused on SORS matrices based on subsampled
orthonormal matrices $H$ with i.i.d. rows chosen uniformly at random from the rows of
an orthonormal matrix $F$ obeying (3.3). However, our results continue to hold for SORS matrices
defined via a much broader class of random matrices $H$ with i.i.d. rows chosen according to
a probability measure on Bounded Orthonormal Systems (BOS). Please see [11, Section 12.1] for
further details on such ensembles. By utilizing results on the Restricted Isometry Property of subsampled
orthogonal random matrices obeying (3.3) we can show that the Multi-resolution RIP holds
at the sparsity and distortion levels required by (3.1). Therefore, Theorem 3.1 immediately implies
a result similar to Gordon’s lemma for SORS matrices.
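
A SORS matrix can be applied without ever forming it. The sketch below is one possible instance, assuming $F$ is the orthonormal Walsh-Hadamard matrix (so $\Delta = 1$ and $n$ is a power of two) and assuming an extra $\sqrt{n/m}$ normalization so that norms are preserved in expectation; Definition 3.2 does not spell out this scaling, and the function names `fwht` and `sors_apply` are ours.

```python
import math
import random

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [scale * t for t in y]

def sors_apply(x, rows, signs):
    """Compute sqrt(n/m) * (sampled rows of F) * D * x in O(n log n) time."""
    n, m = len(x), len(rows)
    y = fwht([s * t for s, t in zip(signs, x)])   # F D x
    scale = math.sqrt(n / m)                      # assumed normalization
    return [scale * y[i] for i in rows]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

rng = random.Random(2)
n, m = 256, 64
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]   # diagonal of D
rows = [rng.randrange(n) for _ in range(m)]           # i.i.d. uniform row indices

spike = [0.0] * n
spike[5] = 1.0                                        # a maximally "spiky" input
spike_norm = norm(sors_apply(spike, rows, signs))
dense = [rng.gauss(0.0, 1.0) for _ in range(n)]
dense_ratio = norm(sors_apply(dense, rows, signs)) / norm(dense)
```

With a Hadamard $F$ every entry of a column has magnitude $1/\sqrt{n}$, so a one-sparse input is embedded with its norm preserved up to rounding, while a dense input is preserved only approximately.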
Theorem 3.3 Let $T \subset \mathbb{R}^n$ and suppose $A \in \mathbb{R}^{m\times n}$ is selected from the SORS distribution of Definition 3.2.
Then,

$\sup_{x\in T} \big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \max\{\delta,\delta^2\}\cdot(\mathrm{rad}(T))^2$, (3.4)

holds with probability at least $1 - 2e^{-\eta}$ as long as

$m \ge C\Delta^2(1+\eta)^2(\log n)^4\,\frac{\max\left(1, \frac{\omega^2(T)}{(\mathrm{rad}(T))^2}\right)}{\delta^2}$. (3.5)


As we mentioned earlier, while we have stated the result for real valued SORS matrices obeying (3.3),
the result can be generalized to complex matrices and more broadly to SORS matrices obtained
from Bounded Orthonormal Systems. We would also like to point out that one can improve the
dependence on $\eta$ and potentially replace a few $\log n$ factors with $\log(\omega(T))$ by utilizing improved
RIP bounds such as [6,8,26]. We note that any future result that reduces log factors in the sample
complexity of RIP will also automatically improve the lower bound on $m$ in our results. In fact, after
the first version of this manuscript became available there has been a very interesting reduction of
log factors by Haviv and Regev in [14]. We believe that utilizing this new RIP result it may be
possible to improve the lower bound in (3.5) to

$m \ge C\Delta^2(1+\eta)^2(\log\omega(T))^2\log n\,\frac{\max\left(1, \frac{\omega^2(T)}{(\mathrm{rad}(T))^2}\right)}{\delta^2}$. (3.6)

We leave this for future research.^2

Ignoring constant/logarithmic factors, Theorem 3.3 is an exact analogue of Gordon’s lemma for
Gaussian matrices in terms of the tradeoff between the reduced dimension $m$ and the distortion
level $\delta$. Gordon’s result for Gaussian matrices has been utilized in numerous problems. Theorem 3.3
above allows one to replace Gaussian matrices with SORS matrices for such problems. For example,
Chandrasekaran et al. [5] use Gordon’s lemma to obtain near optimal sample complexity bounds
for linear inverse problems involving Gaussian matrices. An immediate application of Theorem
3.3 implies near optimal sample complexity results using SORS matrices. To the extent of our
knowledge this is the first sample optimal result using a computationally friendly matrix. We refer
the reader to our companion paper for further detail [24].

Theorem 3.3 is the first result to establish an analogue to Gordon’s Theorem that holds for 
all sets T, while using matrices that have fast multiplication. We would like to pause however to 
mention a few interesting results that hold with additional assumptions on the set T. Perhaps, 
the first results of this kind were established for the Restricted Isometry Property in [4,26], where 
the set T is the set of vectors with a certain sparsity level. In [19] Krahmer and Ward established 
a JL type embedding for RIP matrices with columns multiplied by a random sign pattern. That 
is, the authors show that Theorem 3.3 holds when T is a finite point cloud. More recently, in [30] 
the authors show a Gordon type embedding result holds for manifold signals using RIP matrices
whose columns are multiplied by a random sign pattern. Earlier, we mentioned the very interesting
result of Bourgain et al. [3] which establishes a result in the spirit of Theorem 3.3 for sparse
matrices. However, compared with (3.5), the minimum dimension $m$ in [3], in addition to the mean
width $\omega(T)$ and distortion $\delta$, also depends on a parameter which characterizes the spikiness of the
set $T$. This is of course to be expected as when using sparse ensembles it is not possible to embed
spiky sets into lower dimension without significant loss in terms of distortion. In addition, the
authors of [3] also establish results without the spikiness assumption for particular $T$ using Fast
Johnson-Lindenstrauss (FJLT) matrices, e.g. see [3, Section 6.2]. Recently, Pilanci and Wainwright
in [25] have established a result of similar flavor to Theorem 3.3 but with suboptimal tradeoff
between the allowed dimension reduction and the complexity of the set $T$. Roughly stated, this
result requires $m \gtrsim (\log n)^4\,\omega^4(T)/\delta^2$ using a sub-sampled Hadamard matrix combined with a diagonal
matrix of i.i.d. Rademacher random variables.^3

^2 The reason (3.6) does not follow immediately from the results in [14] is twofold: (1) the results of [14] are based on
more classical definitions of RIP (without the $\max(\delta,\delta^2)$ as in (2.1)) and (2) the dependence on the distortion level
$\delta$ in terms of sample complexity is not of the form $1/\delta^2$ but slightly weaker, and holds only for sufficiently small $\delta$.

4 Proofs 

Before we move to the proof of the main theorem we begin by stating known results on RIP for 
bounded orthogonal systems and show how Theorem 3.3 follows from our main theorem (Theorem 
3.1). 

4.1 Proof of Theorem 3.3 for SORS matrices 

We first state a classical result on RIP originally due to Rudelson and Vershynin [26,29]. We state
the version in [11] which holds generally for bounded orthogonal systems. We remark that the
results in [26,29] as well as those of [11] are stated for the regime $\delta < 1$. However, by going through
the analysis of these papers carefully one can confirm that our definition of RIP (with $\max(\delta,\delta^2)$
on the right-hand side in lieu of $\delta$) continues to hold for $\delta \ge 1$.

Lemma 4.1 (RIP for sparse signals, [11,26,29]) Let $F \in \mathbb{R}^{n\times n}$ denote an orthonormal matrix
obeying

$F^*F = I$ and $\max_{i,j}|F_{ij}| \le \frac{\Delta}{\sqrt{n}}$. (4.1)

Define the random subsampled matrix $H \in \mathbb{R}^{m\times n}$ with i.i.d. rows chosen uniformly at random from
the rows of $F$. Then RIP$(\delta,s)$ holds with probability at least $1 - e^{-\eta}$ for all $\delta > 0$ as long as

$m \ge C\,\frac{\Delta^2 s\,(\log^3 n \log m + \eta)}{\delta^2}$.

Here $C > 0$ is a fixed numerical constant.


^3 We would like to point out that our proofs also hint at an alternative proof strategy to that of [25] if one is interested
in establishing $m \gtrsim (\log n)^4\,\omega^4(T)/\delta^2$. In particular, one can cover the set $T$ with Euclidean balls of size $\delta$. Based on
Sudakov’s inequality the logarithm of the size of this cover is at most $\omega^2(T)/\delta^2$. One can then relate this cover to a
cover obtained by using a random pseudo-metric such as the one defined in [26]. As a result one incurs an additional
factor $(\log n)^4\omega^2(T)$. Multiplying these two factors leads to the requirement $m \gtrsim (\log n)^4\,\omega^4(T)/\delta^2$.
Applying the union bound over $L = \lceil \log_2 n \rceil$ sparsity levels and using the change of variable $\eta \to
\eta + \log L$, together with the fact that $(\log n)^4 + \eta \le (1+\eta)(\log n)^4$, Lemma 4.1 immediately leads to
the following lemma.

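Lemma 4.2 Consider $H \in \mathbb{R}^{m\times n}$ distributed as in Lemma 4.1. $H$ obeys multi-resolution RIP with
sparsity $s$ and distortion $\delta > 0$ with probability $1 - e^{-\eta}$ as long as

$m \ge C(1+\eta)\,\frac{\Delta^2 s \log^4 n}{\delta^2}$.

Theorem 3.3 now follows by using $s = C(1+\eta)$ and $\tilde{\delta} = \frac{\delta\cdot\mathrm{rad}(T)}{C\max(\mathrm{rad}(T),\omega(T))}$ in Theorem 3.1.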

4.2 Connection between JL-embedding and RIP 

A critical tool in our proof is an interesting result due to Krahmer and Ward [19] which shows that 
RIP matrices with columns multiplied by a random sign pattern obey the JL lemma. 

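Theorem 4.3 (Discrete JL embedding via RIP, [19]) Assume $T \subset \mathbb{R}^n$ is a finite set of points.
Suppose $H \in \mathbb{R}^{m\times n}$ is a matrix satisfying RIP$(s,\delta)$ with sparsity $s$ and distortion $\delta > 0$ obeying

$s \ge \min(40(\log(4|T|)+\eta), n)$ and $0 < \delta \le \frac{\varepsilon}{4}$,

and let $D \in \mathbb{R}^{n\times n}$ be a random diagonal matrix with the diagonal entries i.i.d. $\pm 1$ with equal probability.
Then the matrix $A = HD$ obeys

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \max(\varepsilon,\varepsilon^2)\,\|x\|_{\ell_2}^2$, (4.2)

simultaneously for all $x \in T$ with probability at least $1 - e^{-\eta}$.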

The above theorem differs from the result of Krahmer and Ward [19] in two ways. First, the authors
state their result for $0 < \varepsilon < 1$. Furthermore, in the right-hand side of (4.2) the authors use $\varepsilon$ in lieu
of $\max(\varepsilon,\varepsilon^2)$. However, it is easy to verify that their proof (with essentially no modifications) can
accommodate the result stated above.

4.3 Generic chaining related notations and definitions 

Our proof makes use of the generic chaining machinery, e.g. see [27]. We gather some of the required
definitions and notations in this section. Define $N_0 = 1$ and $N_\ell = 2^{2^\ell}$ for $\ell \ge 1$.

Definition 4.4 (Admissible sequence, [27]) Given a set $T$ an admissible sequence is an increasing
sequence $(\mathcal{A}_\ell)$ of partitions of $T$ such that $|\mathcal{A}_\ell| \le N_\ell$.

As noted by Talagrand, increasing sequence of partitions means that every set of $\mathcal{A}_{\ell+1}$ is contained
in a set of $\mathcal{A}_\ell$, and $\mathcal{A}_\ell(t)$ is the unique element of $\mathcal{A}_\ell$ that contains $t$. Then the $\gamma_2$ functional is
defined as

$\gamma_2(T) = \inf \sup_{t\in T} \sum_{\ell=0}^{\infty} 2^{\ell/2}\,\mathrm{rad}(\mathcal{A}_\ell(t))$,

where the infimum is taken over all admissible sequences. Let $\mathcal{A}_\ell$ be one such optimal admissible
sequence. Based on this sequence we define the successive covers.
Definition 4.5 (successive covers) Using $\mathcal{A}_\ell$ we construct successive covers $T_\ell$ of $T$ by taking
the center point^4 of each set of $\mathcal{A}_\ell$.

Let $e_\ell(v)$ be the associated distortion of the cover with respect to a point $v$, i.e. $e_\ell(v) =
\mathrm{dist}(v, T_\ell)$. Then for all $v \in T$, the $\gamma_2$ functional obeys

$\sum_{\ell=0}^{\infty} 2^{\ell/2} e_\ell(v) \le \gamma_2(T)$.

It is well known that $\gamma_2(T)$ and the Gaussian width $\omega(T)$ are of the same order. More precisely,
for a fixed numerical constant $C$

$C^{-1}\omega(T) \le \gamma_2(T) \le C\omega(T)$.
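
The chain sum $\sum_\ell 2^{\ell/2} e_\ell(v)$ can be explored on a small finite point set. The sketch below does not compute an optimal admissible sequence, nor does it use center points of partition cells: it is only a rough proxy that builds nested covers of size at most $N_\ell$ with a greedy farthest-point heuristic, which is our own substitution. The function names are hypothetical.

```python
import math
import random

def farthest_point_order(points):
    """Greedy ordering: each new point is the farthest from those already chosen."""
    order = [0]
    dist = [math.dist(points[0], q) for q in points]
    while len(order) < len(points):
        nxt = max(range(len(points)), key=lambda i: dist[i])
        order.append(nxt)
        dist = [min(d, math.dist(points[nxt], q)) for d, q in zip(dist, points)]
    return order

def chain_sum(points, v, order):
    """Sum over l of 2^(l/2) * dist(v, T_l), with |T_l| <= N_l (N_0 = 1, N_l = 2^(2^l))."""
    total, errors, level = 0.0, [], 0
    while True:
        size = 1 if level == 0 else min(len(points), 2 ** (2 ** level))
        cover = [points[i] for i in order[:size]]   # nested prefix covers
        err = min(math.dist(v, c) for c in cover)   # e_l(v) for this heuristic cover
        errors.append(err)
        total += 2 ** (level / 2) * err
        if size == len(points):
            return total, errors
        level += 1

rng = random.Random(3)
points = [(rng.random(), rng.random()) for _ in range(40)]
order = farthest_point_order(points)
total, errors = chain_sum(points, points[7], order)
```

Because the covers are nested, the distances $e_\ell(v)$ are non-increasing in $\ell$, and they vanish once the cover contains every point of the set.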


Given the distortion $\delta$ in the statement of Theorem 3.1 we also define different scales of distortion

$\delta_0 = \delta,\ \delta_1 = 2^{1/2}\delta,\ \ldots,\ \delta_L = 2^{L/2}\delta$,

with $L = \lceil \log_2 n \rceil$.

4.4 Proof of Theorem 3.1 

Without loss of generality we assume that $\mathrm{rad}(T) = 1$. We begin by noting that the Multi-resolution
RIP property combined with the powerful JL-embedding result stated in Theorem 4.3 allows for JL
embedding at different distortion levels. We apply such an argument to successively more refined
covers of the set $T$ and at different distortion scales inside a generic chaining type argument to
arrive at the proof for an arbitrary (and potentially continuous) set $T$. We should point out that one
can also follow an alternative approach which leads to the same conclusion. Instead of using multi-resolution
RIP, we could have defined a “multi-resolution embedding property” for the mapping $A$
that isometrically maps a finite set of points $T$ with a near optimal set cardinality-distortion tradeoff
at varying levels. One can show that this property also implies isometric embedding of a continuous
set $T$.

We begin by stating a lemma which shows isometric embedding as well as a few other properties
for points belonging to the refined covers $T_\ell$ at different distortion levels $\delta_\ell$. The proof of this lemma
is deferred to Section 4.4.6.

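Lemma 4.6 Suppose $H \in \mathbb{R}^{m\times n}$ obeys MRIP$(s, \frac{\delta}{4})$ with distortion level $\delta$ and sparsity $s = 150(1+\eta)$.
Furthermore, let $D \in \mathbb{R}^{n\times n}$ be a diagonal matrix with a random i.i.d. sign pattern on the diagonal
and set $A = HD$. Also let $T_\ell$ be successive refinements of the set $T$ from Definition 4.5. Then, with
probability at least $1 - \exp(-\eta)$ the following identities hold simultaneously for all $\ell = 1,2,\ldots,L$,

• For all $v \in T_{\ell-1} \cup T_\ell \cup (T_{\ell-1} - T_\ell)$,

$\|Av\|_{\ell_2} \le (1 + 2^{\ell/2}\delta)\,\|v\|_{\ell_2}$. (4.3)

• For all $v \in T_{\ell-1} \cup T_\ell \cup (T_{\ell-1} - T_\ell)$,

$\big|\|Av\|_{\ell_2}^2 - \|v\|_{\ell_2}^2\big| \le \max\left(2^{\ell/2}\delta, 2^{\ell}\delta^2\right)\cdot\|v\|_{\ell_2}^2$. (4.4)

• For all $u \in T_{\ell-1}$ and $v \in T_\ell - \{u\}$,

$|u^*A^*Av - u^*v| \le \max\left(2^{\ell/2}\delta, 2^{\ell}\delta^2\right)\cdot\|u\|_{\ell_2}\|v\|_{\ell_2}$. (4.5)

^4 The center point of a set is the center of the smallest ball containing that set.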

With this lemma in place we are ready to prove our main theorem. To this aim given a point $x \in T$,
for $\ell = 1,2,\ldots,L$ let $z_\ell$ be the closest neighbor of $x$ in $T_\ell$. We also define $z_{L+1} = x$. We note that $z_\ell$
depends on $x$. For ease of presentation we do not make this dependence explicit. We also drop $x$
from the distortion term $e_\ell(x)$ and simply use $e_\ell$. Now observe that for all $\ell = 1,2,\ldots,L$, we have

$\|z_\ell - z_{\ell-1}\|_{\ell_2} \le \|z_\ell - x\|_{\ell_2} + \|z_{\ell-1} - x\|_{\ell_2} \le e_\ell + e_{\ell-1} \le 2e_{\ell-1}$. (4.6)


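We are interested in bounding $\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big|$ for all $x \in T$. Define $\tilde{L} = \max(0, \lfloor 2\log_2(\tfrac{1}{\delta})\rfloor)$ and
note that applying the triangular inequality we have

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \big|\|Az_{\tilde{L}}\|_{\ell_2}^2 - \|z_{\tilde{L}}\|_{\ell_2}^2\big| + \big|\|Ax\|_{\ell_2}^2 - \|Az_{\tilde{L}}\|_{\ell_2}^2\big| + \big|\|x\|_{\ell_2}^2 - \|z_{\tilde{L}}\|_{\ell_2}^2\big|$

$= \sum_{\ell=1}^{\tilde{L}} \Big(\big|\|Az_\ell\|_{\ell_2}^2 - \|z_\ell\|_{\ell_2}^2\big| - \big|\|Az_{\ell-1}\|_{\ell_2}^2 - \|z_{\ell-1}\|_{\ell_2}^2\big|\Big) + \big|\|Ax\|_{\ell_2}^2 - \|Az_{\tilde{L}}\|_{\ell_2}^2\big| + \big|\|x\|_{\ell_2}^2 - \|z_{\tilde{L}}\|_{\ell_2}^2\big| + \big|\|Az_0\|_{\ell_2}^2 - \|z_0\|_{\ell_2}^2\big|$. (4.7)

First note that by Lemma 4.6

$\big|\|Az_0\|_{\ell_2}^2 - \|z_0\|_{\ell_2}^2\big| \le \max(\delta,\delta^2)\,\|z_0\|_{\ell_2}^2 \le \max(\delta,\delta^2)$.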


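Using the above inequality in (4.7) we arrive at

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \sum_{\ell=1}^{\tilde{L}} \Big(\big|\|Az_\ell\|_{\ell_2}^2 - \|z_\ell\|_{\ell_2}^2\big| - \big|\|Az_{\ell-1}\|_{\ell_2}^2 - \|z_{\ell-1}\|_{\ell_2}^2\big|\Big)$

$\quad + \big|\|Ax\|_{\ell_2}^2 - \|Az_{\tilde{L}}\|_{\ell_2}^2\big| + \big|\|x\|_{\ell_2}^2 - \|z_{\tilde{L}}\|_{\ell_2}^2\big| + \max(\delta,\delta^2)$. (4.8)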


We now proceed by bounding each of the first three terms in (4.8). Before getting into the details
of these bounds we would like to point out that (4.8), as well as the results presented in Sections
4.4.1, 4.4.2 and 4.4.3 are derived under the assumption that $\tilde{L} \le L$. Proper modification allows us
to bound $\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big|$ even when $\tilde{L} > L$. We shall explain this argument in complete detail in
Section 4.4.4.
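
The role of the cutoff $\tilde{L} = \max(0, \lfloor 2\log_2(1/\delta)\rfloor)$ is that $\delta_\ell = 2^{\ell/2}\delta \le 1$ exactly for the levels $\ell \le \tilde{L}$. The small sketch below checks this for one illustrative value of $\delta$; the function names are ours.

```python
import math

def cutoff_level(delta):
    """L_tilde = max(0, floor(2 * log2(1 / delta)))."""
    return max(0, math.floor(2.0 * math.log2(1.0 / delta)))

def scale(delta, level):
    """delta_l = 2^(l/2) * delta."""
    return 2 ** (level / 2) * delta

delta = 0.1                      # illustrative value
L_tilde = cutoff_level(delta)
below = [scale(delta, l) for l in range(L_tilde + 1)]   # all at most 1
first_above = scale(delta, L_tilde + 1)                 # exceeds 1
```

Below the cutoff the distortion $\max(\delta_\ell,\delta_\ell^2)$ equals $\delta_\ell$, while above it the quadratic term dominates, which is why the two regimes are handled separately.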


4.4.1 Bounding the first term in (4.8) 

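For $1 \le \ell \le \tilde{L}$, we have $\delta_\ell = 2^{\ell/2}\delta \le 1$ so that $\max(\delta_\ell,\delta_\ell^2) = \delta_\ell$. Thus, applying Lemma 4.6 together
with (4.6) we arrive at

$\big|\|A(z_\ell - z_{\ell-1})\|_{\ell_2}^2 - \|z_\ell - z_{\ell-1}\|_{\ell_2}^2\big| \le 2^{\ell/2}\delta\,\|z_\ell - z_{\ell-1}\|_{\ell_2}^2 \le 2^{\ell/2+2}\delta\,e_{\ell-1}^2$, (4.9)

and

$\big|\langle A(z_\ell - z_{\ell-1}), Az_{\ell-1}\rangle - \langle z_\ell - z_{\ell-1}, z_{\ell-1}\rangle\big| \le 2^{\ell/2+1}\delta\,e_{\ell-1}$. (4.10)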
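The triangular inequality yields

$\big|\|Az_\ell\|_{\ell_2}^2 - \|z_\ell\|_{\ell_2}^2\big| = \big|\|A(z_\ell - z_{\ell-1}) + Az_{\ell-1}\|_{\ell_2}^2 - \|(z_\ell - z_{\ell-1}) + z_{\ell-1}\|_{\ell_2}^2\big|$

$\le \big|\|A(z_\ell - z_{\ell-1})\|_{\ell_2}^2 - \|z_\ell - z_{\ell-1}\|_{\ell_2}^2\big| + \big|\|Az_{\ell-1}\|_{\ell_2}^2 - \|z_{\ell-1}\|_{\ell_2}^2\big|$

$\quad + 2\big|\langle A(z_\ell - z_{\ell-1}), Az_{\ell-1}\rangle - \langle z_\ell - z_{\ell-1}, z_{\ell-1}\rangle\big|$.

Combining the latter with (4.9) and (4.10) we arrive at the following recursion

$\big|\|Az_\ell\|_{\ell_2}^2 - \|z_\ell\|_{\ell_2}^2\big| - \big|\|Az_{\ell-1}\|_{\ell_2}^2 - \|z_{\ell-1}\|_{\ell_2}^2\big| \le \delta\,(2e_{\ell-1} + 4e_{\ell-1}^2)\,2^{\ell/2}$.

Adding both sides of the above inequality for $1 \le \ell \le \tilde{L}$, and using $e_\ell^2 \le 2e_\ell \le 4$, we arrive at

$\sum_{\ell=1}^{\tilde{L}} \Big(\big|\|Az_\ell\|_{\ell_2}^2 - \|z_\ell\|_{\ell_2}^2\big| - \big|\|Az_{\ell-1}\|_{\ell_2}^2 - \|z_{\ell-1}\|_{\ell_2}^2\big|\Big) \le 10\delta \sum_{\ell=1}^{\tilde{L}} 2^{\ell/2} e_{\ell-1}$

$= 10\sqrt{2}\,\delta \sum_{\ell=0}^{\tilde{L}-1} 2^{\ell/2} e_\ell$ (4.11)

$\le 10\sqrt{2}\,\delta\,\gamma_2(T)$. (4.12)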


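4.4.2 Bounding the second term in (4.8)

To bound the second term we begin by bounding $\big|\|Ax\|_{\ell_2} - \|Az_{\tilde{L}}\|_{\ell_2}\big|$. To this aim first note that
since MRIP$(s, \frac{\delta}{4})$ holds for $H$ with $s = 150(1+\eta)$ then $s_L = 150 \times 2^L(1+\eta) \ge n$. As a result for all
$x \in \mathbb{R}^n$ we have

$\big|\|Hx\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \max\left(\tfrac{1}{4}\delta_L, \tfrac{1}{16}\delta_L^2\right)\|x\|_{\ell_2}^2$.

Using the simple inequality $1 + \max(\delta,\delta^2) \le (1+\delta)^2$, this immediately implies

$\|A\| = \|H\| \le \tfrac{1}{4}2^{L/2}\delta + 1$. (4.13)

Furthermore, by the definition of $z_L$ we have $\|x - z_L\|_{\ell_2} \le e_L$. Combining these two inequalities with
repeated use of the triangular inequality we have

$\big|\|Ax\|_{\ell_2} - \|Az_{\tilde{L}}\|_{\ell_2}\big| \le \|Ax - Az_{\tilde{L}}\|_{\ell_2}$

$\le \|A(x - z_L)\|_{\ell_2} + \|A(z_L - z_{\tilde{L}})\|_{\ell_2}$

$\le \|A\|\,\|x - z_L\|_{\ell_2} + \Big\|\sum_{\ell=\tilde{L}+1}^{L} A(z_\ell - z_{\ell-1})\Big\|_{\ell_2}$

$\le \left(\tfrac{1}{4}2^{L/2}\delta + 1\right)e_L + \sum_{\ell=\tilde{L}+1}^{L} \|A(z_\ell - z_{\ell-1})\|_{\ell_2}$.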
e=L+i 
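Using Lemma 4.6 equation (4.3) in the above inequality and noting that for $\ell > \tilde{L}$ we have $2^{\ell/2}\delta \ge 1$,
we conclude that

$\big|\|Ax\|_{\ell_2} - \|Az_{\tilde{L}}\|_{\ell_2}\big| \le \left(\tfrac{1}{4}2^{L/2}\delta + 1\right)e_L + \sum_{\ell=\tilde{L}+1}^{L} (1 + 2^{\ell/2}\delta)\,\|z_\ell - z_{\ell-1}\|_{\ell_2}$

$\le \tfrac{5}{4}\delta\,2^{L/2}e_L + 4\sqrt{2}\,\delta \sum_{\ell=\tilde{L}+1}^{L} 2^{(\ell-1)/2} e_{\ell-1}$

$\le 4\sqrt{2}\,\delta \left(\sum_{\ell=\tilde{L}}^{L} 2^{\ell/2} e_\ell\right)$

$\le 4\sqrt{2}\,\delta\,\gamma_2(T)$. (4.14)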


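Now note that by Lemma 4.6 equation (4.3) and using the fact that $\mathrm{rad}(T) = 1$, we know that
$\|Az_{\tilde{L}}\|_{\ell_2} \le 1 + 2^{\tilde{L}/2}\delta \le 2$. Thus, using this inequality together with (4.14) we arrive at

$\big|\|Ax\|_{\ell_2}^2 - \|Az_{\tilde{L}}\|_{\ell_2}^2\big| \le \big|\|Ax\|_{\ell_2} - \|Az_{\tilde{L}}\|_{\ell_2}\big| \cdot \big(\|Ax\|_{\ell_2} + \|Az_{\tilde{L}}\|_{\ell_2}\big)$

$\le \big|\|Ax\|_{\ell_2} - \|Az_{\tilde{L}}\|_{\ell_2}\big|^2 + \big|\|Ax\|_{\ell_2} - \|Az_{\tilde{L}}\|_{\ell_2}\big| \cdot \|Az_{\tilde{L}}\|_{\ell_2}$

$\le 32\delta^2\gamma_2^2(T) + 8\sqrt{2}\,\delta\,\gamma_2(T)$. (4.15)

4.4.3 Bounding the third term in (4.8)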


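Similar to the second term we begin by bounding $\big|\|x\|_{\ell_2} - \|z_{\tilde{L}}\|_{\ell_2}\big|$. Noting that $2^{\ell/2}\delta \ge 1$ for $\ell \ge \tilde{L}$
we have

$\big|\|x\|_{\ell_2} - \|z_{\tilde{L}}\|_{\ell_2}\big| \le \sum_{\ell=\tilde{L}}^{L} \|z_{\ell+1} - z_\ell\|_{\ell_2} \le 2\sum_{\ell=\tilde{L}}^{L} e_\ell \le 2\sum_{\ell=\tilde{L}}^{L} 2^{\ell/2}\delta\,e_\ell \le 2\delta\,\gamma_2(T)$.

Thus using this inequality together with the fact that $\|z_{\tilde{L}}\|_{\ell_2} \le 1$ we arrive at

$\big|\|x\|_{\ell_2}^2 - \|z_{\tilde{L}}\|_{\ell_2}^2\big| \le \big|\|x\|_{\ell_2} - \|z_{\tilde{L}}\|_{\ell_2}\big| \cdot \big(\|x\|_{\ell_2} + \|z_{\tilde{L}}\|_{\ell_2}\big)$

$\le \big|\|x\|_{\ell_2} - \|z_{\tilde{L}}\|_{\ell_2}\big|^2 + \big|\|x\|_{\ell_2} - \|z_{\tilde{L}}\|_{\ell_2}\big| \cdot \|z_{\tilde{L}}\|_{\ell_2}$

$\le 4\delta^2\gamma_2^2(T) + 2\delta\,\gamma_2(T)$. (4.16)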


4.4.4 Establishing an analog of (4.8) and the bounds (4.12), (4.15), and (4.16) when
$\tilde{L} > L$

This section describes how an analog of (4.8) as well as the subsequent bounds in Sections 4.4.1,
4.4.2 and 4.4.3 can be derived when $\tilde{L} > L$. Using similar arguments leading to the derivation of
(4.8) we arrive at

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \sum_{\ell=1}^{L} \Big(\big|\|Az_\ell\|_{\ell_2}^2 - \|z_\ell\|_{\ell_2}^2\big| - \big|\|Az_{\ell-1}\|_{\ell_2}^2 - \|z_{\ell-1}\|_{\ell_2}^2\big|\Big)$

$\quad + \Big(\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| - \big|\|Az_L\|_{\ell_2}^2 - \|z_L\|_{\ell_2}^2\big|\Big) + \max(\delta,\delta^2)$. (4.17)


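The main difference with the $\tilde{L} \le L$ case is that we let the summation in the first term go up to $L$ and
instead of studying the second line of (4.8), we will directly bound the difference $\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| - \big|\|Az_L\|_{\ell_2}^2 - \|z_L\|_{\ell_2}^2\big|$ in (4.17).

We now turn our attention to bounding the first two terms in (4.17). For the first term in (4.17)
an argument identical to the derivation of (4.12) in Section 4.4.1 allows us to conclude

$\sum_{\ell=1}^{L} \Big(\big|\|Az_\ell\|_{\ell_2}^2 - \|z_\ell\|_{\ell_2}^2\big| - \big|\|Az_{\ell-1}\|_{\ell_2}^2 - \|z_{\ell-1}\|_{\ell_2}^2\big|\Big) \le 10\sqrt{2}\,\delta\,\gamma_2(T)$. (4.18)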


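To bound the second term in (4.17) note that, writing $u = \frac{x - z_L}{\|x - z_L\|_{\ell_2}}$, we have

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| - \big|\|Az_L\|_{\ell_2}^2 - \|z_L\|_{\ell_2}^2\big| \le \big|\big(\|Ax\|_{\ell_2}^2 - \|Az_L\|_{\ell_2}^2\big) - \big(\|x\|_{\ell_2}^2 - \|z_L\|_{\ell_2}^2\big)\big|$

$= \big|\big(\|A(x - z_L) + Az_L\|_{\ell_2}^2 - \|Az_L\|_{\ell_2}^2\big) - \big(\|(x - z_L) + z_L\|_{\ell_2}^2 - \|z_L\|_{\ell_2}^2\big)\big|$

$= \big|\big(\|A(x - z_L)\|_{\ell_2}^2 - \|x - z_L\|_{\ell_2}^2\big) + 2\big(\langle A(x - z_L), Az_L\rangle - \langle x - z_L, z_L\rangle\big)\big|$

$\le \big|\|A(x - z_L)\|_{\ell_2}^2 - \|x - z_L\|_{\ell_2}^2\big| + 2\big|\langle A(x - z_L), Az_L\rangle - \langle x - z_L, z_L\rangle\big|$

$= \big|\|A(x - z_L)\|_{\ell_2}^2 - \|x - z_L\|_{\ell_2}^2\big| + 2\|x - z_L\|_{\ell_2}\,\big|\langle Au, Az_L\rangle - \langle u, z_L\rangle\big|$

$\le \big|\|A(x - z_L)\|_{\ell_2}^2 - \|x - z_L\|_{\ell_2}^2\big| + \tfrac{1}{2}\|x - z_L\|_{\ell_2}\,\big|\|A(u + z_L)\|_{\ell_2}^2 - \|u + z_L\|_{\ell_2}^2\big|$

$\quad + \tfrac{1}{2}\|x - z_L\|_{\ell_2}\,\big|\|A(u - z_L)\|_{\ell_2}^2 - \|u - z_L\|_{\ell_2}^2\big|$. (4.19)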


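To complete our bound note that since MRIP$(s, \frac{\delta}{4})$ holds for $A$ with $s = 150(1+\eta)$ then $s_L =
150 \times 2^L(1+\eta) \ge n$. As a result for all $w \in \mathbb{R}^n$ we have

$\big|\|Aw\|_{\ell_2}^2 - \|w\|_{\ell_2}^2\big| \le \max\left(\tfrac{1}{4}\delta_L, \tfrac{1}{16}\delta_L^2\right)\|w\|_{\ell_2}^2$.

For $\tilde{L} > L$ we have $\delta_L = 2^{L/2}\delta \le 1$ which immediately implies that for all $w \in \mathbb{R}^n$ we have

$\big|\|Aw\|_{\ell_2}^2 - \|w\|_{\ell_2}^2\big| \le \tfrac{1}{4}2^{L/2}\delta\,\|w\|_{\ell_2}^2$. (4.20)

Now using (4.20) with $w = x - z_L$, $w = \frac{x - z_L}{\|x - z_L\|_{\ell_2}} - z_L$, and $w = \frac{x - z_L}{\|x - z_L\|_{\ell_2}} + z_L$ in (4.19) and noting that
$\|z_L\|_{\ell_2} \le \mathrm{rad}(T) \le 1$, we conclude that

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| - \big|\|Az_L\|_{\ell_2}^2 - \|z_L\|_{\ell_2}^2\big|$

$\le \tfrac{1}{4}2^{L/2}\delta\,\|x - z_L\|_{\ell_2}^2 + \tfrac{1}{8}2^{L/2}\delta\,\|x - z_L\|_{\ell_2}\left(\Big\|\tfrac{x - z_L}{\|x - z_L\|_{\ell_2}} + z_L\Big\|_{\ell_2}^2 + \Big\|\tfrac{x - z_L}{\|x - z_L\|_{\ell_2}} - z_L\Big\|_{\ell_2}^2\right)$

$\le \tfrac{1}{2}2^{L/2}\delta\,\|x - z_L\|_{\ell_2} + 2^{L/2}\delta\,\|x - z_L\|_{\ell_2}$

$\le \tfrac{3}{2}2^{L/2}\delta\,e_L$

$\le \tfrac{3}{2}\delta\,\gamma_2(T)$. (4.21)

Plugging (4.18) and (4.21) into (4.17) we arrive at

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le 16\,\delta\,\gamma_2(T) + \max(\delta,\delta^2)$. (4.22)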


4.4.5 Finishing the proof of Theorem 3.1 

To finish off the proof we plug in the bounds from (4.12), (4.15), and (4.16) into (4.8) and use the
fact that $\gamma_2(T) \le C\omega(T)$ for a fixed numerical constant $C$, to conclude that for $\tilde{L} \le L$ we have

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le 10\sqrt{2}\,\delta\gamma_2(T) + 32\delta^2\gamma_2^2(T) + 8\sqrt{2}\,\delta\gamma_2(T) + 4\delta^2\gamma_2^2(T) + 2\delta\gamma_2(T) + \max(\delta,\delta^2)$

$\le 36\,\delta^2C^2\omega^2(T) + 28\,C\delta\,\omega(T) + \max(\delta,\delta^2)$

$\le 72\cdot\max\left(C\delta\,\omega(T), C^2\delta^2\omega^2(T)\right) + \max(\delta,\delta^2)$

$\le 73\cdot\max\left(C\delta\,(\max(1,\omega(T))), C^2\delta^2(\max(1,\omega(T)))^2\right)$. (4.23)

Combining this with the fact that (4.22) holds for $\tilde{L} > L$ we can conclude that for all $x \in T$

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le 73\cdot\max\left(C\delta\,(\max(1,\omega(T))), C^2\delta^2(\max(1,\omega(T)))^2\right)$. (4.24)

Note that assuming MRIP$(s, \frac{\delta}{4})$ with $s = 150(1+\eta)$ we have arrived at (4.24). Applying the
change of variable

$\delta \to \frac{\delta}{292\,C\max(1,\omega(T))}$,

we can conclude that under the stated assumptions of the theorem for all $x \in T$

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \max(\delta,\delta^2)$,

completing the proof. Now all that remains is to prove Lemma 4.6. This is the subject of the next
section.
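
The numerical constants collected in (4.23) can be verified by direct arithmetic. The variable names below are ours, and the reading of 292 as $4 \times 73$ (the factor 73 from (4.24) combined with the $\delta/4$ in MRIP$(s, \delta/4)$) is our interpretation rather than a statement made in the text.

```python
import math

# Coefficients of delta * gamma_2(T) coming from (4.12), (4.15) and (4.16).
linear_terms = 10 * math.sqrt(2) + 8 * math.sqrt(2) + 2
# Coefficients of delta^2 * gamma_2(T)^2 coming from (4.15) and (4.16).
quadratic_terms = 32 + 4
# Both groups are then bounded by a common maximum, giving the constant 72.
combined = quadratic_terms + 28
# Assumed reading: 73 from (4.24) times the 1/4 scaling in MRIP(s, delta/4).
final_constant = 73 * 4
```

The check confirms that $18\sqrt{2} + 2 \le 28$ and $36 + 28 \le 72$, which is all the second and third lines of (4.23) require.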
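4.4.6 Proof of Lemma 4.6


For a set $\mathcal{M}$ we define the normalized set $\widetilde{\mathcal{M}} = \left\{\frac{v}{\|v\|_{\ell_2}} : v \in \mathcal{M}\right\}$. We shall also define

$Q_\ell = T_{\ell-1} \cup T_\ell \cup (T_\ell - T_{\ell-1}) \cup \left(\widetilde{(T_\ell - T_{\ell-1})} - \widetilde{T}_{\ell-1}\right) \cup \left(\widetilde{(T_\ell - T_{\ell-1})} + \widetilde{T}_{\ell-1}\right)$.

We will first prove that for $\ell = 1,2,\ldots,L$ and every $v \in Q_\ell$

$\big|\|Av\|_{\ell_2}^2 - \|v\|_{\ell_2}^2\big| \le \max\left(2^{\ell/2}\delta, 2^{\ell}\delta^2\right)\|v\|_{\ell_2}^2$, (4.25)

holds with probability at least $1 - e^{-\eta}$. We then explain how the other identities follow from this
result. To this aim, note that by the assumptions of the lemma MRIP$(s, \frac{\delta}{4})$ holds for the matrix
$H$ with $s = 150(1+\eta)$. By definition this is equivalent to RIP$(s_\ell, \frac{\delta_\ell}{4})$ holding for $\ell = 1,2,\ldots,L$ with
$(s_\ell, \delta_\ell) = (2^\ell s, 2^{\ell/2}\delta)$. Now observe that the number of entries of $Q_\ell$ obeys $|Q_\ell| \le 5N_\ell$ with $N_\ell = 2^{2^\ell}$
which implies

$s_\ell = 2^\ell s$

$= 2^\ell(150 + 150\eta)$

$\ge 2^\ell\left(40(\log 2)(\log_2(20) + 1) + (\eta + 1)\right)$

$\ge 2^\ell\left(40(\log 2)\left(\tfrac{\log_2(20)}{2^\ell} + 1\right) + (\eta + 1)\right)$

$\ge 40(\log 2)\left(\log_2(20) + 2^\ell\right) + \ell(\eta + 1)$

$\ge 40\log(4|Q_\ell|) + \ell(\eta + 1)$

$\ge \min\left(40\log(4|Q_\ell|) + \ell(\eta + 1), n\right)$. (4.26)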

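By the MRIP assumption, RIP$(s_\ell, \frac{\delta_\ell}{4})$ holds for $H$. This together with (4.26) allows us to apply
Theorem 4.3 to conclude that for each $\ell = 1,2,\ldots,L$ and every $x \in Q_\ell$

$\big|\|Ax\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \le \max(\delta_\ell,\delta_\ell^2)\,\|x\|_{\ell_2}^2$,

holds with probability at least $1 - e^{-\ell(\eta+1)}$. Noting that

$\sum_{\ell=1}^{L} e^{-\ell(\eta+1)} \le \sum_{\ell=1}^{\infty} e^{-\ell(\eta+1)} = \frac{e^{-(\eta+1)}}{1 - e^{-(\eta+1)}} \le e^{-\eta}$,

completes the proof of (4.25) by the union bound.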
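
The geometric series used in this union bound is easy to check numerically. The sketch below compares the finite sum, its closed-form limit and the target $e^{-\eta}$ for a few illustrative values of $\eta \ge 0$; the function names are ours.

```python
import math

def failure_bound(eta, L):
    """Sum over l = 1..L of exp(-l * (eta + 1))."""
    return sum(math.exp(-l * (eta + 1.0)) for l in range(1, L + 1))

def geometric_limit(eta):
    """Closed form of the infinite series: e^{-(eta+1)} / (1 - e^{-(eta+1)})."""
    q = math.exp(-(eta + 1.0))
    return q / (1.0 - q)

checks = [
    (failure_bound(eta, 20), geometric_limit(eta), math.exp(-eta))
    for eta in (0.0, 0.5, 3.0)
]
```

The last inequality reduces to $e^{-(\eta+1)} \le 1 - e^{-1}$, which holds for every $\eta \ge 0$.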

We note that since $T_{\ell-1} \cup T_\ell \cup (T_\ell - T_{\ell-1}) \subset Q_\ell$, (4.25) immediately implies (4.4). The proof of
(4.3) follows from the proof of (4.4) by noting that

$(1 + \delta_\ell)^2 \ge 1 + \max(\delta_\ell,\delta_\ell^2)$.


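To prove (4.5), first note that $\frac{v}{\|v\|_{\ell_2}} - \frac{u}{\|u\|_{\ell_2}} \in \widetilde{(T_\ell - T_{\ell-1})} - \widetilde{T}_{\ell-1}$ and $\frac{v}{\|v\|_{\ell_2}} + \frac{u}{\|u\|_{\ell_2}} \in \widetilde{(T_\ell - T_{\ell-1})} + \widetilde{T}_{\ell-1}$.
Hence, applying (4.25)

$\left|\Big\|A\Big(\tfrac{u}{\|u\|_{\ell_2}} + \tfrac{v}{\|v\|_{\ell_2}}\Big)\Big\|_{\ell_2}^2 - \Big\|\tfrac{u}{\|u\|_{\ell_2}} + \tfrac{v}{\|v\|_{\ell_2}}\Big\|_{\ell_2}^2\right| \le \max(\delta_\ell,\delta_\ell^2)\,\Big\|\tfrac{u}{\|u\|_{\ell_2}} + \tfrac{v}{\|v\|_{\ell_2}}\Big\|_{\ell_2}^2$,

$\left|\Big\|A\Big(\tfrac{v}{\|v\|_{\ell_2}} - \tfrac{u}{\|u\|_{\ell_2}}\Big)\Big\|_{\ell_2}^2 - \Big\|\tfrac{v}{\|v\|_{\ell_2}} - \tfrac{u}{\|u\|_{\ell_2}}\Big\|_{\ell_2}^2\right| \le \max(\delta_\ell,\delta_\ell^2)\,\Big\|\tfrac{v}{\|v\|_{\ell_2}} - \tfrac{u}{\|u\|_{\ell_2}}\Big\|_{\ell_2}^2$.

Summing these two identities and applying the triangular inequality we conclude that

$\frac{1}{\|u\|_{\ell_2}\|v\|_{\ell_2}}\,|u^*A^*Av - u^*v| \le \frac{1}{4}\max(\delta_\ell,\delta_\ell^2)\left(\Big\|\tfrac{u}{\|u\|_{\ell_2}} + \tfrac{v}{\|v\|_{\ell_2}}\Big\|_{\ell_2}^2 + \Big\|\tfrac{u}{\|u\|_{\ell_2}} - \tfrac{v}{\|v\|_{\ell_2}}\Big\|_{\ell_2}^2\right) = \max(\delta_\ell,\delta_\ell^2)$,

completing the proof of (4.5).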
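
The step from norm preservation to inner-product preservation rests on the polarization identity $\langle Au, Av\rangle - \langle u, v\rangle = \tfrac{1}{4}\big[(\|A(u+v)\|^2 - \|u+v\|^2) - (\|A(u-v)\|^2 - \|u-v\|^2)\big]$. The sketch below checks this identity numerically for an arbitrary small matrix; the sizes, the seed and the helper names are illustrative assumptions.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def gap(A, w):
    """||A w||^2 - ||w||^2."""
    Aw = matvec(A, w)
    return dot(Aw, Aw) - dot(w, w)

rng = random.Random(4)
m, n = 5, 8
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
plus = [a + b for a, b in zip(u, v)]
minus = [a - b for a, b in zip(u, v)]
lhs = dot(matvec(A, u), matvec(A, v)) - dot(u, v)   # <Au, Av> - <u, v>
rhs = 0.25 * (gap(A, plus) - gap(A, minus))         # polarization form
```

The identity holds for any matrix, so the only thing the proof needs from $A$ is control of `gap` on the two normalized sum and difference vectors.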